-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
p2p: re-check after sleeps #2664
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2664 +/- ##
===========================================
- Coverage 62.37% 62.32% -0.05%
===========================================
Files 212 212
Lines 17210 17215 +5
===========================================
- Hits 10734 10729 -5
- Misses 5578 5585 +7
- Partials 898 901 +3
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Would be great if you answer my questions and provide logs with reconnect attempts (if possible).
p2p/switch.go
Outdated
@@ -325,6 +325,11 @@ func (sw *Switch) reconnectToPeer(addr *NetAddress) { | |||
return | |||
} | |||
|
|||
if sw.IsDialingOrExistingAddress(addr) { | |||
sw.Logger.Info("Peer connection has been established or dialed while we waiting next try", "addr", addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
p2p/pex/pex_reactor.go
Outdated
@@ -395,6 +394,9 @@ func (r *PEXReactor) ensurePeers() { | |||
if r.Switch.IsDialingOrExistingAddress(try) { | |||
continue | |||
} | |||
if r.Switch.NodeInfo().ID() == try.ID { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you decide to add this check? If we're picking our address for connecting in ensurePeers
, then something is wrong. That's why we add code like
Line 397 in f94eb42
if !netAddr.Same(ourAddr) { |
and
Lines 458 to 459 in f94eb42
// Add ourselves to addrbook to prevent dialing ourselves | |
addrBook.AddOurAddress(nodeInfo.NetAddress()) |
it's a little messy, but the point is ensurePeers should only use external addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. The protocol part of the listen address definitions in the configuration caused my problem.
If the listen address looks like tcp://995f83b21e6b66b62883853cc77fe567da48986c@0.0.0.0:2101
the id part can not be resolved in node_info.
So it's caused to unable to check our address.
@mgurevin Hi! Could you follow up on my questions? We want to merge this, but #2664 (comment) needs to be addressed before. |
Hi @melekes, I saw your comment. I will split my additions and send their detailed logs in a few days, sorry for delay. |
Hi again, I have split my additions into different commits and attach their detailed logs. My setup is three node and they defined as persistent nodes in the configuration files. Log files;
|
This reverts commit 0c6e0fc.
I have complete my pull request, if it looks good you can merge. The main problem is the listen address definition because node ID cannot be resolved if it contains the protocol part. In this pull request, we fixed this issue and add check the actual connection status when reconnecting after sleep. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🍓 🍌 🥛
Co-Authored-By: mgurevin <mehmet@gurevin.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. I'll merge develop in, fix the conflicts, and merge this PR
p2p/netaddress.go
Outdated
func IDAddressString(id ID, hostPort string) string { | ||
// we respect the protocol definition in here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We merged a similar fix in https://github.com/tendermint/tendermint/pull/2797/files#diff-91e8069d5cb91a5b3067a01482559679R38. There we just drop the protocol prefix, since we shouldn't need it anymore. Sorry didn't see this earlier, it would have saved us some time.
@@ -328,6 +328,11 @@ func (sw *Switch) reconnectToPeer(addr *NetAddress) { | |||
return | |||
} | |||
|
|||
if sw.IsDialingOrExistingAddress(addr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. These changes are partial fixes for #2716.
Note the races are still possible, but this should help significantly. We'll likely have to rearchitect the code a bit to solve this fully.
Thanks!
* 'master' of https://github.com/tendermint/tendermint: (330 commits) Release/v0.26.1 (tendermint#2803) fix amino overhead computation for Tx (tendermint#2792) p2p: re-check after sleeps (tendermint#2664) check the result of `ps.peer.Send` before calling `ps.setHasVote` (tendermint#2787) p2p: AddressBook requires addresses to have IDs; Do not close conn immediately after sending pex addrs in seed mode (tendermint#2797) test AutoFile#Size (happy path) [autofile/group] do not panic when checking size openFile creates a file if not exist => ErrNotExist is not possible use our logger in autofile/group Add tests for ValidateBasic methods (tendermint#2754) [docs] improve organization of ABCI docs & fix links (tendermint#2749) p2p: peer-id -> peer_id (tendermint#2771) mempool: print postCheck error (tendermint#2762) Fix crypto/merkle ProofOperators.Verify to check bounds on keypath pa… (tendermint#2756) Mempool WAL is still created by default in home directory, leads to permission errors (tendermint#2758) mempool: ErrPreCheck and more log info (tendermint#2724) Release/v0.26.0 (tendermint#2726) [ADR] [DRAFT] pubsub 2.0 (tendermint#2532) validate reactor messages (tendermint#2711) TMHASH is 32 bytes. Closes tendermint#1990 (tendermint#2732) ...
Hi there, I saw there are many re-connection attempts and trying to connect itself while ensuring peers in pex reactor. I think that issues appear for me because I'm testing three node setup with persistent peers. So I guess this patch is the solution. It works for me.